Cytometry Feb 2001
Review
A brief history of numbers and statistics traces the development of numbers from prehistory to the completion of our current system of numeration with the introduction of the decimal fraction by Viète, Stevin, Bürgi, and Galileo at the turn of the 16th century. This was followed in the mid-17th century by the development of what we now know as probability theory by Pascal, Fermat, and Huygens, which arose from questions about gambling with dice and can be regarded as the origin of statistics. The three main probability distributions on which statistics depend were introduced and/or formalized between the mid-17th and early 19th centuries: the binomial distribution by Pascal; the normal distribution by de Moivre, Gauss, and Laplace; and the Poisson distribution by Poisson. The formal discipline of statistics commenced with the work of Pearson, Yule, and Gosset at the turn of the 19th century, when the first statistical tests were introduced. Elementary descriptions are given of the statistical tests most likely to be used with cytometric data, and it is shown how these can be applied to the analysis of difficult immunofluorescence distributions in which the labeled and unlabeled cell populations overlap.
Topics: Cell Count; Computers; Fluorescent Antibody Technique; Mathematics; Statistics as Topic
PubMed: 11241502
DOI: 10.1002/1097-0320(20010215)46:1<1::aid-cyto1032>3.0.co;2-3
Frontiers in Neurology 2022
Calculating the crude or adjusted annualized relapse rate (ARR) and its confidence interval (CI) is often required in clinical studies of chronic relapsing diseases, such as multiple sclerosis and neuromyelitis optica spectrum disorders. However, accurately calculating the ARR and estimating its 95% CI requires careful application of statistical approaches and basic familiarity with the exponential family of distributions. When the relapse rate can be regarded as constant over time and across individuals, the crude ARR can be calculated using the person-years method, which divides the number of relapses observed among all participants by the total follow-up period of the study cohort. If the number of relapses can be modeled by the Poisson distribution, the 95% CI of the ARR can be obtained by finding the values of the mean parameter λ that place the observed count at the upper and lower 2.5% critical points. Basic familiarity with F-statistics is also required when comparing the ARR between two disease groups. It is necessary to distinguish the observed relapse rate ratio (RR) between two sample groups (sample RR) from the unobserved RR between their originating populations (population RR). The ratio of the population RR to the sample RR approximately follows the F distribution, with degrees of freedom obtained by doubling the number of observed relapses in each of the two sample groups. Based on this, a 95% CI of the population RR can be estimated. When the count data of the response variable are overdispersed, the negative binomial distribution is a better fit than the Poisson. The adjusted ARR and its 95% CI can be obtained using generalized linear regression models after selecting an appropriate error structure (e.g., Poisson, negative binomial, zero-inflated Poisson, or zero-inflated negative binomial) according to the overdispersion and zero inflation in the response variable.
PubMed: 35756930
DOI: 10.3389/fneur.2022.875456
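The person-years ARR and the exact Poisson CI described in the abstract above can be sketched in pure Python. This is only the crude (unadjusted) calculation; the event counts in the usage line and the bisection setup are illustrative, not taken from the paper.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), accumulated in log space for stability."""
    return sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
               for i in range(k + 1))

def crude_arr_with_ci(relapses, person_years, alpha=0.05):
    """Crude ARR (total relapses / total follow-up) with an exact Poisson CI
    for the expected count, rescaled to a rate."""
    def bisect(f, lo, hi, iters=200):
        # f is decreasing in lam; locate its root by bisection
        for _ in range(iters):
            mid = (lo + hi) / 2.0
            if f(mid) > 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    upper = 10.0 * relapses + 10.0
    # lower limit: lam such that P(X >= relapses | lam) == alpha/2
    lam_lo = 0.0 if relapses == 0 else bisect(
        lambda lam: alpha / 2 - (1.0 - poisson_cdf(relapses - 1, lam)), 0.0, upper)
    # upper limit: lam such that P(X <= relapses | lam) == alpha/2
    lam_hi = bisect(lambda lam: poisson_cdf(relapses, lam) - alpha / 2, 0.0, upper)
    return relapses / person_years, lam_lo / person_years, lam_hi / person_years
```

For example, 20 relapses over 50 person-years gives a crude ARR of 0.40 with an exact 95% CI of roughly 0.24 to 0.62, matching the usual chi-square-based exact limits.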
Statistics in Medicine Jul 2022
Provider profiling has been recognized as a useful tool in monitoring health care quality, facilitating inter-provider care coordination, and improving medical cost-effectiveness. Existing methods often use generalized linear models with fixed provider effects, especially when profiling dialysis facilities. As the number of providers under evaluation escalates, the computational burden becomes formidable even for specially designed workstations. To address this challenge, we introduce a serial blockwise inversion Newton algorithm exploiting the block structure of the information matrix. A shared-memory divide-and-conquer algorithm is proposed to further boost computational efficiency. In addition to the computational challenge, the current literature lacks an appropriate inferential approach to detecting providers with outlying performance especially when small providers with extreme outcomes are present. In this context, traditional score and Wald tests relying on large-sample distributions of the test statistics lead to inaccurate approximations of the small-sample properties. In light of the inferential issue, we develop an exact test of provider effects using exact finite-sample distributions, with the Poisson-binomial distribution as a special case when the outcome is binary. Simulation analyses demonstrate improved estimation and inference over existing methods. The proposed methods are applied to profiling dialysis facilities based on emergency department encounters using a dialysis patient database from the Centers for Medicare & Medicaid Services.
Topics: Aged; Health Personnel; Humans; Medicare; Quality of Health Care; United States
PubMed: 35318706
DOI: 10.1002/sim.9387
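The Poisson-binomial special case mentioned above (a binary outcome summed over patients with heterogeneous fitted probabilities) can be computed exactly by direct convolution. The probabilities below are invented; in the paper's setting they would come from the fitted model under the null provider effect.

```python
def poisson_binomial_pmf(probs):
    """Exact pmf of a sum of independent Bernoulli(p_i) via convolution."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this Bernoulli contributes 0
            nxt[k + 1] += mass * p       # this Bernoulli contributes 1
        pmf = nxt
    return pmf

def exact_upper_p(observed, probs):
    """P(X >= observed) under the Poisson-binomial null."""
    return sum(poisson_binomial_pmf(probs)[observed:])
```

Because the pmf is exact for any number of patients, small providers with extreme outcomes do not rely on large-sample approximations, which is the point of the exact test.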
Archives of Disease in Childhood Feb 1993
Review
Topics: Analysis of Variance; Binomial Distribution; Discriminant Analysis; Predictive Value of Tests; Statistics as Topic
PubMed: 8481051
DOI: 10.1136/adc.68.2.246
BMC Medical Research Methodology Jan 2019
BACKGROUND
Health economic models are critical tools to inform reimbursement agencies on health care interventions. Many clinical trials report outcomes using the frequency of an event over a set period of time, for example, the primary efficacy outcome in most clinical trials of migraine prevention is mean change in the frequency of migraine days (MDs) per 28 days (monthly MDs [MMD]) relative to baseline for active treatment versus placebo. Using these cohort-level endpoints in economic models, accounting for variation among patients is challenging. In this analysis, parametric models of change in MMD for migraine preventives were assessed using data from erenumab clinical studies.
METHODS
MMD observations from the double-blind phases of two studies of erenumab were used: one in episodic migraine (EM) (NCT02456740) and one in chronic migraine (CM) (NCT02066415). For each trial, two longitudinal regression models were fitted: negative binomial and beta-binomial. For a thorough comparison, we also present the fits from the standard multilevel Poisson and the zero-inflated negative binomial models.
RESULTS
Using the erenumab study data, both the negative binomial and beta-binomial models provided unbiased estimates relative to observed trial data with well-fitting distribution at various time points.
CONCLUSIONS
This proposed methodology, which has not previously been applied in migraine, shows that these models may be suitable for estimating MMD frequency. Modelling MMD using negative binomial and beta-binomial distributions can be advantageous because these models capture intra- and inter-patient variability, so that trial observations can be modelled parametrically for the purposes of economic evaluation of migraine prevention. Such models have implications for use in a wide range of disease areas when assessing repeatedly measured utility values.
Topics: Antibodies, Monoclonal, Humanized; Binomial Distribution; Calcitonin Gene-Related Peptide Receptor Antagonists; Data Interpretation, Statistical; Humans; Migraine Disorders; Models, Statistical; Time Factors
PubMed: 30674285
DOI: 10.1186/s12874-019-0664-5
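The negative binomial's suitability for overdispersed count outcomes like MMD can be illustrated with a moment-matching fit in the mean/dispersion parameterization (Var = μ + μ²/r). The paper fits full longitudinal regressions, so this single-timepoint sketch with made-up counts is only illustrative.

```python
import math

def fit_negative_binomial(counts):
    """Method-of-moments fit of NB(mu, r), assuming overdispersion (Var > mean)."""
    n = len(counts)
    mu = sum(counts) / n
    var = sum((c - mu) ** 2 for c in counts) / (n - 1)
    if var <= mu:
        raise ValueError("no overdispersion: a Poisson model may fit better")
    r = mu ** 2 / (var - mu)     # shape parameter from Var = mu + mu^2 / r
    return mu, r

def nb_pmf(k, mu, r):
    """NB pmf in the mean (mu) / dispersion (r) parameterization."""
    logp = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    return math.exp(logp)
```

A small r signals strong inter-patient variability; as r grows, the NB approaches the Poisson with the same mean.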
Bernoulli : Official Journal of the... Aug 2017
We establish exponential bounds for the hypergeometric distribution which include a finite sampling correction factor but are otherwise analogous to bounds for the binomial distribution due to León and Perron (2003) and Talagrand (1994). We also extend a convex ordering of Kemperman (1973) for sampling without replacement from populations of real numbers between zero and one: a population of all zeros or ones (and hence yielding a hypergeometric distribution in the upper bound) gives the extreme case.
PubMed: 29520197
DOI: 10.3150/15-BEJ800
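The flavor of such bounds can be checked numerically: the hypergeometric upper tail sits below the classical exponential bound exp(-2nt²), which Hoeffding showed also applies to sampling without replacement. The bounds in the paper are sharper refinements of this type; the population numbers below are invented for illustration.

```python
import math

def hypergeom_pmf(k, N, K, n):
    """P(X = k) when drawing n without replacement from N items, K 'successes'."""
    return (math.comb(K, k) * math.comb(N - K, n - k)) / math.comb(N, n)

def hypergeom_upper_tail(k, N, K, n):
    return sum(hypergeom_pmf(j, N, K, n) for j in range(k, min(K, n) + 1))

def hoeffding_bound(k, N, K, n):
    """exp(-2 n t^2) for the deviation t of the sample mean above K/N."""
    t = k / n - K / N
    return math.exp(-2 * n * t * t) if t > 0 else 1.0
```

For N=100, K=50, n=20, the exact probability of 16 or more successes is well under the exponential bound exp(-3.6) ≈ 0.027, reflecting the finite sampling correction.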
BMC Medical Research Methodology Apr 2015
BACKGROUND
Sample size calculations should correspond to the intended method of analysis. Nevertheless, for non-normal distributions, they are often done on the basis of normal approximations, even when the data are to be analysed using generalized linear models (GLMs).
METHODS
For the case of comparing two means, we use GLM theory to derive sample size formulae, with particular cases being the negative binomial, Poisson, binomial, and gamma families. By simulation we estimate the performance of normal approximations, which, via the identity link, are special cases of our approach, and of common link functions such as the log. The negative binomial and gamma scenarios are motivated by examples from hookworm vaccine trials and insecticide-treated materials, respectively.
RESULTS
Calculations on the link function (log) scale work well for the negative binomial and gamma scenarios examined and are often superior to the normal approximations. However, they have little advantage for the Poisson and binomial distributions.
CONCLUSIONS
The proposed method is suitable for sample size calculations for comparisons of means of highly skewed outcome variables.
Topics: Algorithms; Binomial Distribution; Computer Simulation; Humans; Linear Models; Models, Theoretical; Sample Size
PubMed: 25886883
DOI: 10.1186/s12874-015-0023-0
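One common form of the log-link calculation for the Poisson and negative binomial cases looks like the following Wald-style sketch (equal group sizes assumed; this is a standard textbook form, not necessarily the paper's exact formulae).

```python
from math import ceil, log
from statistics import NormalDist

def n_per_group_poisson(mu0, mu1, alpha=0.05, power=0.9):
    """Per-group sample size to detect mu0 vs mu1 on the log scale,
    using Var(log mean estimate) ~ 1/(n * mu) in each group."""
    z = NormalDist()
    za, zb = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    return ceil((za + zb) ** 2 * (1 / mu0 + 1 / mu1) / log(mu1 / mu0) ** 2)

def n_per_group_negbin(mu0, mu1, k, alpha=0.05, power=0.9):
    """Negative binomial version: each group's variance term gains 1/k,
    where k is the NB shape (dispersion) parameter."""
    z = NormalDist()
    za, zb = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    return ceil((za + zb) ** 2 * (1 / mu0 + 1 / mu1 + 2 / k) / log(mu1 / mu0) ** 2)
```

Overdispersion inflates the requirement sharply: for means 2 vs 3 at 90% power, the Poisson formula gives 54 per group, while an NB shape of k=1 more than triples it.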
Annals of Internal Medicine Oct 2021
BACKGROUND
Despite expected initial universal susceptibility to a novel pandemic pathogen like SARS-CoV-2, the pandemic has been characterized by higher observed incidence in older persons and lower incidence in children and adolescents.
OBJECTIVE
To determine whether differential testing by age group explains observed variation in incidence.
DESIGN
Population-based cohort study.
SETTING
Ontario, Canada.
PARTICIPANTS
Persons diagnosed with SARS-CoV-2 and those tested for SARS-CoV-2.
MEASUREMENTS
Test volumes from the Ontario Laboratories Information System, number of laboratory-confirmed SARS-CoV-2 cases from the Integrated Public Health Information System, and population figures from Statistics Canada. Demographic and temporal patterns in incidence, testing rates, and test positivity were explored using negative binomial regression models and standardization. Sources of variation in standardized ratios were identified and test-adjusted standardized infection ratios (SIRs) were estimated by metaregression.
RESULTS
Observed disease incidence and testing rates were highest in the oldest age group and markedly lower in those younger than 20 years; no differences in incidence were seen by sex. After adjustment for testing frequency, SIRs were lowest in children and in adults aged 70 years or older and markedly higher in adolescents and in males aged 20 to 49 years compared with the overall population. Test-adjusted SIRs were highly correlated with standardized positivity ratios (Pearson correlation coefficient, 0.87 [95% CI, 0.68 to 0.95]; P < 0.001) and provided a case identification fraction similar to that estimated with serologic testing (26.7% vs. 17.2%).
LIMITATIONS
The novel methodology requires external validation. Case and testing data were not linkable at the individual level.
CONCLUSION
Adjustment for testing frequency provides a different picture of SARS-CoV-2 infection risk by age, suggesting that younger males are an underrecognized group at high risk for SARS-CoV-2 infection.
PRIMARY FUNDING SOURCE
Canadian Institutes of Health Research.
Topics: Adolescent; Adult; Age Distribution; Aged; Aged, 80 and over; Binomial Distribution; COVID-19; COVID-19 Testing; Child; Child, Preschool; Female; Humans; Incidence; Infant; Infant, Newborn; Male; Middle Aged; Ontario; Pandemics; SARS-CoV-2; Sex Distribution; Young Adult
PubMed: 34399059
DOI: 10.7326/M20-7003
Journal of Applied Statistics 2020
In recent years, a variety of regression models, including zero-inflated and hurdle versions, have been proposed to explain the behaviour of a dependent count variable with respect to exogenous covariates. Apart from the classical Poisson, negative binomial, and generalised Poisson distributions, many proposals have appeared in the statistical literature, perhaps in response to the new possibilities offered by advanced software that now enables researchers to implement numerous special functions in a relatively simple way. However, we believe that a significant research gap remains, since very little attention has been paid to the quasi-binomial distribution, which was first proposed over fifty years ago. We believe this distribution might constitute a valid alternative to existing regression models in situations in which the variable has bounded support. Therefore, in this paper we present a zero-inflated regression model based on the quasi-binomial distribution, derive its moments and maximum likelihood estimators, and perform score tests to compare the zero-inflated quasi-binomial distribution with the zero-inflated binomial distribution, and the zero-inflated model with the homogeneous model (the model in which covariates are not considered). This analysis is illustrated with two data sets that are well known in the statistical literature and contain a large number of zeros.
PubMed: 35706839
DOI: 10.1080/02664763.2019.1707517
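The zero-inflated binomial used as the comparison baseline above can be written directly; the paper's quasi-binomial version replaces the binomial kernel with Consul's quasi-binomial pmf, which is not reproduced here.

```python
import math

def zib_pmf(x, n, p, pi):
    """Zero-inflated binomial: with probability pi the observation is a
    structural zero, otherwise it is Binomial(n, p)."""
    binom = math.comb(n, x) * p ** x * (1 - p) ** (n - x)
    return pi * (x == 0) + (1 - pi) * binom

def zib_loglik(data, n, p, pi):
    """Log-likelihood of observed counts under the zero-inflated binomial."""
    return sum(math.log(zib_pmf(x, n, p, pi)) for x in data)
```

Maximizing this log-likelihood in (p, pi), and comparing against the homogeneous fit, is the ingredient behind the score tests the paper describes.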
BMC Bioinformatics Jan 2022
Cellular heterogeneity underlies cancer evolution and metastasis. Advances in single-cell technologies such as single-cell RNA sequencing and mass cytometry have enabled interrogation of cell type-specific expression profiles and abundance across heterogeneous cancer samples obtained from clinical trials and preclinical studies. However, challenges remain in determining the sample sizes needed to ascertain changes in cell type abundances in a controlled study. To address this statistical challenge, we have developed a new approach, named Sensei, to determine the number of samples and the number of cells required to ascertain such changes between two groups of samples in single-cell studies. Sensei extends the t-test and models the cell abundances using a beta-binomial distribution. We evaluate the mathematical accuracy of Sensei and provide practical guidelines on over 20 cell types in over 30 cancer types based on knowledge acquired from The Cancer Genome Atlas (TCGA) and prior single-cell studies. We provide a web application to enable user-friendly study design at https://kchen-lab.github.io/sensei/table_beta.html.
Topics: Binomial Distribution; Humans; Neoplasms; Research Design; Sample Size; Software
PubMed: 34983369
DOI: 10.1186/s12859-021-04526-5
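The beta-binomial overdispersion that Sensei models can be sketched with stdlib sampling. The shape parameters a and b below are arbitrary illustrations; the actual tool calibrates them from atlas data and wraps a t-test for power estimation.

```python
import random

def sample_beta_binomial(n, a, b, rng):
    """Draw p ~ Beta(a, b), then count successes out of n trials at rate p.
    Cell-type counts per sample vary more than Binomial(n, a/(a+b)) alone."""
    p = rng.betavariate(a, b)
    return sum(rng.random() < p for _ in range(n))
```

For a = b = 2 and n = 100 cells, the theoretical variance is n·m·(1-m)·[1 + (n-1)/(a+b+1)] = 520, versus 25 for a plain binomial with the same mean of 50; this extra between-sample spread is why sample-size calculations based on the binomial alone would be optimistic.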